WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

Authors

  • Daniel Hewlett
  • Alexandre Lacoste
  • Llion Jones
  • Illia Polosukhin
  • Andrew Fandrianto
  • Jay Han
  • Matthew Kelcey
  • David Berthelot
Abstract

We present WIKIREADING, a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs). We compare various state-of-the-art DNN-based architectures for document classification, information extraction, and question answering. We find that models supporting a rich answer space, such as word or character sequences, perform best. Our best-performing model, a word-level sequence-to-sequence model with a mechanism to copy out-of-vocabulary words, obtains an accuracy of 71.8%.

Similar resources

Hierarchical Question Answering for Long Documents

Reading an article and answering questions about its content is a fundamental task for natural language understanding. While most successful neural approaches to this problem rely on recurrent neural networks (RNNs), training RNNs over long documents can be prohibitively slow. We present a novel framework for question answering that can efficiently scale to longer documents while maintaining or...

WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making

We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a web site, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is ...

DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus

The ever increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need in large-scale training and evaluation corpora. Due to its size, its openness and relative quality, the Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated W...

The Links Have It: Infobox Generation by Summarization over Linked Entities

Online encyclopedias such as Wikipedia have become one of the best sources of knowledge. Much effort has been devoted to expanding and enriching the structured data by automatic information extraction from unstructured text in Wikipedia. Although remarkable progress has been made, the effectiveness and efficiency of these efforts are still limited as they try to tackle an extremely difficult natural language ...

Iranian EFL Learners’ Motivational Fluctuation in Task Performance over Different Timescales

Motivation for learning a new language is both self and time-oriented. The language learner’s motivation experiences gradual fluctuation over time and the view of oneself is different on each timescale of the study. Interaction among different timescales throughout the Second Language Development (SLD) is a novel area of investigation (de Bot, 2015). In order to probe this interactive nature, t...

Journal:
  • CoRR

Volume: abs/1608.03542

Publication date: 2016